Matching user search queries with the keywords that advertisers bid on in real time is a crucial problem in sponsored search. Two broad classes of approaches have been explored in the literature to address it: (i) dense retrieval (DR), which learns embeddings of queries and bid keywords in a shared space, and (ii) natural language generation (NLG), which learns to directly generate bid keywords given a query. In this work, we first conduct an empirical study of both approaches and show that they offer additive, complementary benefits. In particular, a large fraction of the keywords retrieved by NLG are not retrieved by DR, and vice versa. We then show that it is possible to effectively combine the strengths of both approaches in a single model. Specifically, we propose HEARTS, a novel multi-task fusion framework in which a shared encoder is jointly optimized to perform DR and non-autoregressive NLG simultaneously. Through extensive experiments on 30+ search query sets spanning 20+ languages, we show that HEARTS retrieves 40.3% more high-quality bid keywords than baseline methods using the same GPU compute. We also demonstrate that inference on a single HEARTS model is as good as inference on two separate DR and NLG baseline models at 2x the compute. Furthermore, we show that DR models trained with the HEARTS objective are substantially better than those trained with the standard contrastive loss function. Finally, we show that our HEARTS objective can be applied to short-text retrieval tasks beyond sponsored search and achieves significant performance gains.
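The fusion objective described above can be sketched compactly. The following minimal PyTorch sketch is an assumption-laden illustration, not the paper's architecture: a shared encoder feeds both an in-batch contrastive DR loss and a parallel (non-autoregressive) keyword-token loss, and the two terms are simply summed.

```python
# Illustrative multi-task fusion of DR and non-autoregressive NLG over a
# shared encoder. All module names, dimensions, and the attention-pooling
# generation head are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointDRNLGModel(nn.Module):
    def __init__(self, vocab_size=32000, dim=256, max_keyword_len=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Non-autoregressive head: predict all keyword tokens in parallel
        # from a fixed number of learned positional "slots".
        self.slots = nn.Parameter(torch.randn(max_keyword_len, dim))
        self.lm_head = nn.Linear(dim, vocab_size)

    def encode(self, token_ids):
        h = self.encoder(self.embed(token_ids))
        return F.normalize(h.mean(dim=1), dim=-1)  # pooled embedding for DR

    def generate_logits(self, token_ids):
        h = self.encoder(self.embed(token_ids))            # (B, L, D)
        q = self.slots.unsqueeze(0).expand(h.size(0), -1, -1)
        attn = torch.softmax(q @ h.transpose(1, 2) / h.size(-1) ** 0.5, dim=-1)
        return self.lm_head(attn @ h)                      # (B, K, V)

def joint_loss(model, query_ids, keyword_ids, temperature=0.05):
    # DR term: in-batch contrastive loss between query/keyword embeddings.
    q, k = model.encode(query_ids), model.encode(keyword_ids)
    labels = torch.arange(q.size(0))
    dr_loss = F.cross_entropy(q @ k.t() / temperature, labels)
    # NLG term: parallel token prediction; keyword_ids are assumed padded
    # or truncated to max_keyword_len.
    gen = model.generate_logits(query_ids)
    nlg_loss = F.cross_entropy(gen.reshape(-1, gen.size(-1)),
                               keyword_ids.reshape(-1))
    return dr_loss + nlg_loss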
A key factor in optimal acceptance of and comfort with automated vehicle features is the driving style. Mismatches between the automated driving style and the driver's preferred one can lead users to take over more frequently or even disable the automation features. This work proposes identifying users' driving-style preferences from multimodal signals, so that the vehicle can match user preferences in a continuous, automatic way. We conducted a driving-simulator study with 36 participants and collected extensive multimodal data, including behavioral, physiological, and situational data. This includes eye gaze, steering grip, driving maneuvers, brake and throttle pedal inputs, foot distance from the pedals, pupil diameter, galvanic skin response, heart rate, and situational driving context. We then built machine learning models to identify the preferred driving styles and confirmed that all modalities are important for identifying user preferences. This work paves the road toward implicitly adaptive driving styles for automated vehicles.
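As a rough illustration of the modeling step, the sketch below uses synthetic data and assumed feature groupings (not the study's actual features or labels): the three modality groups are concatenated and a standard classifier is cross-validated.

```python
# Minimal sketch of multimodal feature fusion for preference classification.
# Feature names, group sizes, and the three-way label are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 360  # e.g., windows of simulator data pooled over participants

behavioral    = rng.normal(size=(n, 5))  # steering grip, pedal inputs, ...
physiological = rng.normal(size=(n, 4))  # pupil diameter, GSR, heart rate, ...
situational   = rng.normal(size=(n, 3))  # driving-context descriptors

X = np.hstack([behavioral, physiological, situational])
y = rng.integers(0, 3, size=n)  # preferred style, e.g. defensive/neutral/sporty

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```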
This work explores the search for heterogeneous approximate multiplier configurations for neural networks that produce high accuracy and low energy consumption. We discuss the validity of additive Gaussian noise, added to accurate neural network computations, as a surrogate model for the behavioral simulation of approximate multipliers. The continuous and differentiable properties of the solution space spanned by the additive Gaussian noise model are used as a heuristic that generates meaningful estimates of layer robustness without combinatorial optimization techniques. Instead, the amount of noise injected into the accurate computations is learned during network training using backpropagation. A probabilistic model of the multiplier error is presented to bridge the gap between the domains; the model estimates the standard deviation of the approximate multiplier error, connecting solutions in the additive Gaussian noise space to actual hardware instances. Our experiments show that the combination of heterogeneous approximation and neural network retraining reduces the energy for multiplications by 70% to 79% for different ResNet variants on the CIFAR-10 dataset, with a Top-1 accuracy loss below one percentage point. For the more complex Tiny ImageNet task, our VGG16 model achieves a 53% reduction in energy consumption with a drop in Top-5 accuracy of 0.5 percentage points. We further demonstrate that our error model can predict the parameters of approximate multipliers in the context of the commonly used additive Gaussian noise (AGN) model with high accuracy. Our software implementation is available at https://github.com/etrommer/agn-approx.
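The surrogate mechanism lends itself to a short sketch. The PyTorch layer below injects additive Gaussian noise with a learnable standard deviation into an otherwise accurate convolution; the relative-noise parameterization is our assumption for illustration, and the authors' actual formulation lives in the linked repository.

```python
# Sketch: inject Gaussian noise with a learnable std into a layer's accurate
# output, so the tolerable noise magnitude (a robustness proxy) is learned
# by backpropagation alongside the weights.
import torch
import torch.nn as nn

class NoisyConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        # Log-sigma parameterization keeps the learned std positive.
        self.log_sigma = nn.Parameter(torch.tensor(-3.0))

    def forward(self, x):
        y = self.conv(x)
        if self.training:
            # Assumed relative-noise model: error scales with the magnitude
            # of the accurate output, mimicking operand-dependent multiplier
            # error. Gradients reach log_sigma through the exp() factor.
            y = y + torch.exp(self.log_sigma) * y.detach().abs() * torch.randn_like(y)
        return y
```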
In this work, we focus on semi-supervised learning for video action detection, which utilizes both labeled and unlabeled data. We propose a simple end-to-end consistency-based approach that effectively exploits the unlabeled data. Video action detection requires not only action-class prediction but also spatio-temporal localization of the action. We therefore study two types of constraints: classification consistency and spatio-temporal consistency. The presence of predominant background and static regions in videos makes it challenging to exploit spatio-temporal consistency for action detection. To address this, we propose two novel regularization constraints for spatio-temporal consistency: 1) temporal coherency and 2) gradient smoothness. Both exploit the temporal continuity of actions in videos and are found to be effective at utilizing unlabeled videos for action detection. We demonstrate the effectiveness of the proposed approach on two action detection benchmark datasets, UCF101-24 and JHMDB-21. In addition, we show the effectiveness of the proposed approach for video object segmentation on YouTube-VOS, which demonstrates its generalization capability. Compared with recent fully supervised approaches, the proposed approach achieves competitive performance using only 20% of the annotations on UCF101-24. On UCF101-24, it improves over the supervised approach by +8.9% and +11% at 0.5 f-mAP and v-mAP, respectively.
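Assuming per-frame localization maps as the prediction format (a simplification; the paper's exact formulations may differ), the two regularizers reduce to penalties on temporal differences, roughly as follows.

```python
# Rough sketches of the two spatio-temporal regularizers named above.
# `pred` holds per-frame action localization maps of shape (B, T, H, W)
# in [0, 1]; this representation is an assumption for illustration.
import torch

def temporal_coherency_loss(pred):
    # Penalize abrupt changes of the localization map between adjacent frames.
    return (pred[:, 1:] - pred[:, :-1]).abs().mean()

def gradient_smoothness_loss(pred):
    # Penalize temporal changes of spatial gradients, encouraging the
    # localized region to move smoothly over time.
    dx = pred[..., :, 1:] - pred[..., :, :-1]   # horizontal spatial gradient
    dy = pred[..., 1:, :] - pred[..., :-1, :]   # vertical spatial gradient
    return ((dx[:, 1:] - dx[:, :-1]).abs().mean()
            + (dy[:, 1:] - dy[:, :-1]).abs().mean())
```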
Unpaired video-to-video translation aims to translate videos between source and target domains without requiring paired training data, making it more feasible for real-world applications. Unfortunately, the translated videos often suffer from temporal and semantic inconsistency. To address this, many existing works adopt spatio-temporal consistency constraints that incorporate temporal information based on motion estimation. However, inaccuracies in motion estimation degrade the quality of the guidance for spatio-temporal consistency, leading to unstable translation. In this work, we propose a novel paradigm that regularizes spatio-temporal consistency by synthesizing motion in the input video with generated optical flow, rather than estimating it. The synthetic motion can therefore be applied within the regularization paradigm to keep motion consistent across domains without the risk of motion-estimation errors. Thereafter, we utilize our unsupervised recycle loss and unsupervised spatial loss, guided by the pseudo-supervision provided by the synthetic optical flow, to accurately enforce spatio-temporal consistency in both domains. Experiments show that our method achieves state-of-the-art performance in generating temporally and semantically consistent videos across various scenarios. Code is available at: https://github.com/wangkaihong/unsup_recycle_gan/.
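The key property is that the synthesized flow is known by construction, so consistency can be enforced without a flow estimator. A heavily simplified sketch, in which the flow synthesis, warping, and loss names are all our assumptions:

```python
# Sketch of regularization by synthetic motion: generate a random smooth
# flow, and require the generator's output to move the same way its input
# did under that known flow.
import torch
import torch.nn.functional as F

def synthetic_flow(b, h, w, scale=4.0):
    # Smooth random flow field: bilinearly upsampled low-resolution noise.
    coarse = torch.randn(b, 2, max(1, h // 8), max(1, w // 8))
    return scale * F.interpolate(coarse, size=(h, w), mode="bilinear",
                                 align_corners=False)

def warp(img, flow):
    # img: (B, C, H, W); flow: (B, 2, H, W) in pixel offsets.
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys)).float().unsqueeze(0)   # (1, 2, H, W)
    grid = (base + flow).permute(0, 2, 3, 1).clone()    # (B, H, W, 2)
    grid[..., 0] = 2 * grid[..., 0] / (w - 1) - 1       # normalize x
    grid[..., 1] = 2 * grid[..., 1] / (h - 1) - 1       # normalize y
    return F.grid_sample(img, grid, align_corners=True)

def motion_consistency_loss(G, frame):
    # Motion is known by construction: no estimator, no estimation error.
    flow = synthetic_flow(frame.size(0), frame.size(2), frame.size(3))
    return (warp(G(frame), flow) - G(warp(frame, flow))).abs().mean()
```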
In this paper, we discuss the development of a multilingual dataset annotated with a hierarchical, fine-grained tagset marking different types of aggression and the "context" in which they occur. The context is defined here by the conversational thread in which a specific comment occurs and by the "type" of discursive role the comment performs with respect to previous comments. The initial dataset discussed here (and made available as part of the ComMA@ICON shared task) consists of 15,000 annotated comments in four languages — Meitei, Bangla, Hindi, and Indian English — collected from various social media platforms such as YouTube, Facebook, Twitter, and Telegram. As is usual on social media websites, a large number of these comments are multilingual, mostly code-mixed with English. This paper gives a detailed description of the tagset used for annotation and of the process of developing a multi-label annotation scheme that can be used for marking comments with various types of aggression and bias, including gender bias, religious intolerance (called communal bias in the tagset), class/caste bias, and ethnic/racial bias. We also define and discuss the tags used for marking the discursive roles performed by comments, such as attack, defend, and so on. We also present a statistical analysis of the dataset and the results of our baseline experiments on developing an automatic aggression identification system using the dataset.
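Purely for illustration, a single annotated comment under such a hierarchical multi-label scheme might be represented as below; the field and value names are paraphrased from the description above, not the tagset's exact labels.

```python
# Hypothetical representation of one comment's multi-label annotation,
# including the thread context and discursive role described above.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CommentAnnotation:
    text: str
    language: str                       # Meitei / Bangla / Hindi / English
    aggression_level: str               # e.g. overt / covert / non-aggressive
    biases: List[str] = field(default_factory=list)  # gender, communal, ...
    discursive_role: Optional[str] = None            # attack, defend, ...
    parent_id: Optional[str] = None     # thread context: the comment replied to

example = CommentAnnotation(
    text="...",
    language="Hindi",
    aggression_level="covert",
    biases=["gender"],
    discursive_role="attack",
    parent_id="comment_0421",
)
```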
Quadruped robots are currently used in industrial robotics as mechanical aids to automate several routine tasks. However, the use of such a robot in a domestic setting is still very much an open research question. This paper discusses the understanding and virtual simulation of such a robot capable of detecting and understanding human emotions, generating its gait, and responding via sounds and expressions on a screen. To this end, we use a combination of reinforcement learning and software engineering concepts to simulate a quadruped robot that can understand emotions, navigate through various terrains, detect sound sources, and respond to emotions using audio-visual feedback. This paper aims to establish a framework for simulating a quadruped robot that is emotionally intelligent and can primarily respond to audio-visual stimuli using motor or audio responses. The emotion detection from speech was not as performant as ERANNs or Zeta Policy learning, still managing an accuracy of 63.5%. The video emotion detection system produced results that are almost on par with the state of the art, with an accuracy of 99.66%. Due to its "on-policy" learning process, the PPO algorithm learned extremely rapidly, allowing the simulated dog to demonstrate a remarkably seamless gait across the different cadences and variations. This enabled the quadruped robot to respond to the generated stimuli, allowing us to conclude that it functions as predicted and satisfies the aim of this work.
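For context, on-policy PPO training has a small API surface in common libraries. A minimal stable-baselines3 sketch on a stand-in environment (the actual work trains a simulated quadruped, not a pendulum) looks like this:

```python
# Minimal on-policy PPO training loop via stable-baselines3. The gait
# environment is replaced by a standard stand-in task for illustration.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("Pendulum-v1")  # stand-in for a quadruped gait environment
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=100_000)  # on-policy rollouts + clipped updates
model.save("gait_policy")
```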
Searching long egocentric videos with natural language queries (NLQ) has compelling applications in augmented reality and robotics, where a fluid index into everything that a person (agent) has seen before could augment human memory and surface relevant information on demand. However, the structured nature of the learning problem (free-form text query inputs, localized video temporal window outputs) and its needle-in-a-haystack nature makes it both technically challenging and expensive to supervise. We introduce Narrations-as-Queries (NaQ), a data augmentation strategy that transforms standard video-text narrations into training data for a video query localization model. Validating our idea on the Ego4D benchmark, we find it has tremendous impact in practice. NaQ improves multiple top models by substantial margins (even doubling their accuracy), and yields the very best results to date on the Ego4D NLQ challenge, soundly outperforming all challenge winners in the CVPR and ECCV 2022 competitions and topping the current public leaderboard. Beyond achieving the state-of-the-art for NLQ, we also demonstrate unique properties of our approach such as gains on long-tail object queries, and the ability to perform zero-shot and few-shot NLQ.
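The augmentation itself is a simple data transformation. The sketch below converts timestamped narrations into (query, temporal window) training pairs; the fixed-width window heuristic is our assumption for illustration, and the paper derives its windows with its own scheme.

```python
# Sketch of narrations-as-queries: timestamped video-text narrations become
# supervision for natural-language query localization.
from typing import List, Tuple

def narrations_to_queries(
    narrations: List[Tuple[float, str]],  # (timestamp_sec, narration_text)
    window: float = 2.0,                  # assumed fixed window width
) -> List[dict]:
    samples = []
    for t, text in narrations:
        samples.append({
            "query": text,                      # narration reused as the query
            "start": max(0.0, t - window / 2),  # localized temporal window
            "end": t + window / 2,
        })
    return samples

print(narrations_to_queries([(12.4, "C picks up the knife")]))
```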
Machine Translation (MT) systems generally aim at automatically rendering a source language into a target language while retaining the original context, using various Natural Language Processing (NLP) techniques. Among these, Statistical Machine Translation (SMT) uses probabilistic and statistical techniques for analysis and conversion. This paper describes the development of bilingual SMT models for translating English to fifteen low-resource Indian Languages (ILs) and vice versa. At the outset, all 15 languages are briefed with a short description related to our experimental need. Further, a detailed analysis of the Samanantar and OPUS datasets for model building, along with the standard benchmark dataset (Flores-200) for fine-tuning and testing, is done as a part of our experiment. Different preprocessing approaches are proposed in this paper to handle the noise in the datasets. To create the system, the MOSES open-source SMT toolkit is explored. Distance reordering is utilized with the aim of capturing grammar rules and context-dependent adjustments through a phrase-reordering categorization framework. In our experiments, translation quality is evaluated using standard metrics such as BLEU, METEOR, and RIBES.
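Since evaluation relies on standard metrics, a minimal scoring sketch is straightforward; the snippet below (placeholder sentences) computes corpus BLEU with sacrebleu, while METEOR and RIBES come from their own tooling.

```python
# Scoring a system output against references with corpus-level BLEU.
# The hypothesis/reference strings are placeholders.
import sacrebleu

hypotheses = ["the cat sat on the mat"]
references = [["the cat is sitting on the mat"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(bleu.score)
```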
We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras and two stereo cameras, in addition to lidar point clouds and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
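To make the forecasting task structure concrete, here is a purely illustrative sketch with hypothetical class names (the dataset's real schema is defined by the av2 devkit) and a constant-velocity stand-in for a learned predictor.

```python
# Hypothetical structure of a motion-forecasting scenario: track histories
# for scored actors, from which future motion is predicted.
from dataclasses import dataclass
from typing import List

@dataclass
class TrackState:
    x: float
    y: float
    heading: float
    vx: float
    vy: float

@dataclass
class ScoredActor:
    track_id: str
    category: str               # e.g. vehicle, pedestrian, cyclist
    history: List[TrackState]   # observed past states

@dataclass
class Scenario:
    scenario_id: str
    city: str                   # one of six cities; HD map attached per scenario
    actors: List[ScoredActor]

def predict_future(actor: ScoredActor, horizon: int = 60) -> List[TrackState]:
    # Trivial constant-velocity baseline as a stand-in for a learned model.
    last, dt = actor.history[-1], 0.1
    return [TrackState(last.x + last.vx * dt * i, last.y + last.vy * dt * i,
                       last.heading, last.vx, last.vy)
            for i in range(1, horizon + 1)]
```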